Math 560 Homework 08: Langrange Multipliers --------------------------------------------------------- 1 Consider the electricity-generation problem that we used, min 3x^2 + 2y^2 s.t. x+y>=4, all variables non-negative. In this problem, you'll be solving various versions of the NLP. I recommend using a separate sheet for each, so you can keep a record of your work. a) Find the solution using Excel Solver, and ask for a Sensitivity Report. Then draw the situation at optimality, involving the obj.func. gradient vector, the constraint gradient vector, and the constraint gradient vector scaled by the Lagrange Mult. b) Change the demand constraint to >=4.1 and re-solve. c) Compare the results to what was predicted by the Lagrange multiplier(s?) in part (a) d) Going back to >=4, also enforce x=0 and re-solve, again getting a Sensitivity Report. Draw a diagram, as in part (a). e) Considering that part (d)'s x=0 constraint was originally an x>=0 constraint, what is the Lagrange multiplier telling you about being on the x>=0 constraint boundary? Explain. f) Formulate part (a) for Wolfram Alpha by asking it for the stationary points of the Lagrangian function. We'd like to call the Lagrange multiplier "lambda", but it doesn't usually understand that, so call the Lagrange multiplier "z". Something like this: stationary points of (some expression involving x, y, z) Check that you get a reasonable result compared to your answer to part (a) g) Repeat (f) for the problem in part (d) with the x=0 constraint enforced. You will need 2 Lagrange multipliers; I suggest you call them z and w. --------------------------------------------------------- 2. Read the article "Economic Congestion Relief Across Multiple Regions Requires Tradable Physical Flow-Gate Rights" at http://people.emich.edu/aross15/e/electricity-dualization.pdf and the questions/responses from a previous year's student, included below in this homework assignment. Or, if you prefer some other paper involving Lagrange multipliers, let me know and you can maybe use that other paper. Using Excel or Python or Matlab/Octave (I recommend Excel), duplicate the NLP shown at the bottom-left of page 3 of the document, using the values shown in Figure 1; you may check your work against the results shown in Figure 4. You will need to get the Cost functions from Figure 1. Do you really need to have the q_i variables as official Decision Variables that you tell Solver about, along with their node-flow constraints? On Figure 4, draw: the constraint gradient vectors at optimality, the constraint gradient vectors multiplied by their Lagrange multipliers, their vector sum, and the objective function gradient vector. If you need to scale them all by some constant to fit on the graph, go ahead. Also, write down any questions or thoughts you have about the article. See below for a past student's questions and my answers. Optional: also solve the partially dualized problem in Excel or other software. --------------------------------------------------------- Optional: Consider a chain with N links hanging from (a,b) to (c,d). Link #i has a weight of w_i and a length of L_i. Formulate an NLP that will find the shape that the chain settles into, by minimizing the total gravitational potential energy. Part of the fun of this problem is seeing what solutions the solver settles on that are physically unlikely--if you see any of those, please include a graph and comment on it. a) Solve for N=20, with all links having the same weight and same length (1 gram and sqrt(2) cm). The left anchor point is at (-10,+10) and the right anchor point is at (+10,+10). Show your formulation and a graph of the resulting chain. b) Re-solve for N=40 links, each half as long as in part (a). Do this chain correspond with the chain in part (a) at every second link junction? c) Let's put a floor at y=5, that the chain may not go below. Re-solve part (a). Comment if anything interesting happens in the final chain. d) (optional) does either chain (a) or (b) correspond with the continuum limit of the catenary curve, which is related to hyperbolic-cosine? e) (optional) Figure out some other cool thing to try, like changing the weights or lengths of individual links, or other sorts of floor constraints, or whatever else you dream up. --------------------------------------------------------- Questions/Responses on the Congestion electricity article mentioned above: 1) What’s a bilateral transaction? It's a contract between a producer of electricity and a particular (large) consumer, like an aluminum refinery or steel recycler. It says something like "I will supply you with 100 MW from 8am to 6pm on this particular day, at this particular price." This is as opposed to generators just selling their power to the general demand pool, or retail customers like you and me just buying it from our power lines. The difficulty of bilateral contracts is that the power has to go from a particular place to another particular place--there isn't much flexibility, and you hope there's enough capacity on the major transmission lines. 2) What’s the PTDF (Power Transfer Distribution Factor), and what is it used for? It's a number between 0 and 1, and it quantifies how electric power doesn't take just a single path through the network. For example, if you're trying to send 100 MW of power from the nuclear plant Fermi II (in Monroe County, MI) to Ann Arbor, maybe 80 MW of that power will go along a transmission line from Romulus to Belleville, and 1 MW will go along a transmission line from Toledo to Saline, etc. So, the PTDF on the Romulus-Belleville line is 80% for that transaction, and on the Toledo-Saline line is 1%. You need to know the PTDFs to figure out the effect of any proposed change, to avoid overloading any of the lines. 3) What’s counterflow? Suppose you want to send 100 MW on a line from A to B that has a capacity of only 95 MW. If you can find someone else who wants to send 5 MW from B to A, then you've found a way to do what you want without melting the line. That 5 MW from B to A is counterflow. 4) What is meant by the sentence, “Revenues from the sales of counter-flow rights will subsidize out of merit generation that helps relieving congestion”? "Merit order" means "beginning with the cheapest bid and progressing to more expensive bids" to supply power. It's the ideal, but sometimes we can't do that. For example, suppose you have a very cheap generator but the nearby lines have a low capacity; you'll have to run other, more expensive generators instead ("out of merit generation"="out of merit-order") to relieve the congestion. That's going to cost extra money--where will that money come from? "sales of counter-flow rights". 5) What’s economic dispatch? Figuring out which generators to run at which output levels to satisfy demand at the lowest possible cost, without overloading any transmission lines. Economic=at lowest cost, dispatch=telling people what to do (e.g. dispatching police cars) 6) Wouldn’t an AC model be better than a DC model, since the transmission is of AC? (Or is it?) Almost all transmission is AC, because it's much easier to transform up to high voltages (hundreds of kilovolts) for long-distance transmission and then back down to lower voltages (e.g. 4800 volts) for neighborhood distribution, and ultimately 220 volts for house power. This is done because the higher the voltage, the lower the current (at the same power level), and the lower the current, the lower the power lost to resistance in the lines. This was a big debate between Thomas Edison and George Westinghouse back in the early days of commercial power; high voltage (AC) is cheaper than low-voltage (DC) because you lose less power in transmission, but high voltage is more dangerous (obviously). Actually, recent advances in electronics have made it possible to convert DC to AC without huge losses, so there are now high-voltage DC transmission lines in some places. Anyway, I agree that AC models are better than DC models. What we're saying is: if you build your policy on DC models (like CHHP does), you may be in for a surprise when you implement your policy in the real world, which uses AC models. When we said "Such possible mishaps can occur when a control area operator uses an AC model (rather than a DC approximation)", we didn't mean that AC models are bad. Perhaps we should have said: "Such possible mishaps can occur when a control area operator uses a realistic AC model (rather than a DC approximation, as CHHP proposes)." 7) What other types of problems might partial dualization be valuable in? As we said in class later, partial dualization = Lagrangian Relaxation, and it's used in many fields where you have some constraints that are relatively easy to deal with, and some that are a pain in the neck. You dualize (=move into the objective function) just the pain-in-the-neck constraints, and try to find multipliers that are roughly what their Lagrange multipliers would have been if you had left them as constraints. 8) What’s the difference between real power and reactive power? If reactive power isn't REAL power, it must be: IMAGINARY (sqrt(-1)) power! I'm actually not kidding. Resistive loads (toasters, electric heaters or blankets, old incandescent light bulbs) consume real power, according to the equation V=I*R (voltage=current * resistance). At any point in the AC power cycle (which happens at 60 cycles per second), the current in a resistive load is directly proportional to the voltage, with the constant of proportionality being the resistance. That is, the voltage and current are exactly in phase. However, things like capacitors and inductors don't behave that way. An inductor puts up a fight against any change in current. Imagine water moving in a long pipe: if you all of a sudden increase the water pressure, it will take a little while for the current (=amount of water flowing) to change in response, due to the inertia of the existing water. So, the peak of the current comes a little after the peak of the pressure (=voltage); current and voltage are not exactly in phase. Capacitors do the opposite: there, current peaks before voltage. In either case, the product of Volts*Amps, which is ordinarily the Power (1 volt*1amp = 1 Watt) is not quite the power that's actually being consumed; the difference is the reactive power. Electric motors often act somewhat like inductors, so your furnace fan, refrigerator, and air conditioning all require some reactive power. Reactive power doesn't do any useful work, but it still has to be "supplied" to keep the system in balance. Interestingly, the devices that convert DC from solar panels to AC (which are called "inverters") can create just about any mix of real and reactive power, which is a fairly new advance. Also, while compact fluorescent lights use much less real power than incandescents, they require some reactive power while incandescents require basically none. So anyway, because AC involves sine waves being in and out of phase with each other, it's often easiest to express things in complex-polar notation, exp(a+j*b) where j=sqrt(-1) because electrical engineers reserve "i" for current (don't ask me why). You can see why we chose to ignore reactive power in our paper--it wouldn't affect the main conclusion. 9) Is "convexifying" a real word? ? If so, is "concavifying" a word? "Convexifying" is a perfectly cromulent word. Concavifying is only acceptable in countries south of the equator, where what they're really doing is convexifying, but it looks upside-down so in 1961 treaty the two hemispheres agreed that concavifying is an acceptable word (notice that 1961 looks the same to both hemispheres.) 10) Why does adding a quadratic penalty term convexify the dualized objective function? This is the "augmented Lagrangian" method that I mentioned in class the other day. The ordinary Lagrangian is f(x) + lambda^T * h(x) and as we've said, it quite often has a saddle point instead of a local minimum at the site of the original minimum (to the non-dualized problem). We augment by adding a term that penalizes us quadratically for any straying from the constraints. In the simple version, where h(x) is the set of _equality_ constraints (ie h(x) = 0), a measure of how far we are from the constraints is just h(x) itself. So, our augmented Lagrangian is f(x) + lambda^T * h(x) + nu * (h(x))^T * h(x) where the "nu" is a variable that tells us how much to penalize ourselves for violating the constraints (we will want nu>0, of course). Since the quadratic part is convex, if we add a big enough version of it (ie make nu big enough) to the rest of the Lagrangian, we will get a locally convex function. 11) What does block-separable mean? The variables can be broken down into subsets, where each subset appears in its own little subset of constraints, and not any of the others. 12) Which of the methods we discussed in class work for problems that are not convex, but are "locally convex near the feasible region"? Pretty much all of them, as long as you start close enough to the optimal solution. 15) What program was used to generate the graphs in the paper? They look cool. The graphs were generated with "gnuplot". The non-graph figures were generated with "xfig", which was originally a unix-only program but now there are windows versions (winfig and jfig). If I had it to do again, I would probably do the graphs in Matlab instead of Gnuplot. 16) What’s an energy adjustment bid? It's about paying someone (or quoting a price you'd like to be paid) to adjust the amount of energy or power you're using, to avoid overwhelming a transmission line. 17) In reference to the statement, "Our example … was not easy to find," how long did it take to find it? I spent a few weeks (or months?) trying to come up with an example by hand, or even trying to write an optimization problem that would produce "the optimal counter-example". When that failed, I said "heck, let's just try random parameters" and that found a counter-example on the 47th trial. That one was a trivial counter-example (I think it was based on the values repeating every 2*pi units, which can be dealt with easily), but within a thousand more trials I think I found the one you see in this paper. The way the trials went was this: 1. generate cost and transmission parameters randomly 2. solve the original Optimal Power Flow problem, get Lagrange multipliers 3. dualize using those Lagrange multipliers, re-solve, see if new answer is infeasible for non-dualized problem. 4. repeat at step 1. I can send you the AMPL code that did this if you like. 18) What does it mean for something to be "distributed via the NERC process"? (Reference 1) NERC is like Congress or IEEE standards-setting bodies; they have to make a proposal of a new standard, put it out there for discussion, see what people think, etc. That stuff isn't done in established academic journals. Originally it was probably done by photocopying and mailing; these days it's almost certainly web-based. (PS: if you don't know what "cromulent" means, google it.)